Contra Costa County
Evaluating Long-Context Reasoning in LLM-Based WebAgents
Chung, Andy, Zhang, Yichi, Lin, Kaixiang, Rawal, Aditya, Gao, Qiaozi, Chai, Joyce
As large language model (LLM)-based agents become increasingly integrated into daily digital interactions, their ability to reason across long interaction histories becomes crucial for providing personalized and contextually aware assistance. However, the performance of these agents in long-context scenarios, particularly for action-taking WebAgents operating in realistic web environments, remains largely unexplored. This paper introduces a benchmark for evaluating the long-context reasoning capabilities of WebAgents through sequentially dependent subtasks that require retrieval and application of information from extended interaction histories. We develop a novel evaluation framework that simulates multi-session user interactions by injecting irrelevant task trajectories between dependent subtasks, creating contexts ranging from 25,000 to 150,000 tokens. Through extensive evaluation of four popular models, Claude-3.7, GPT-4.1, Llama 4, and o4-mini, we observe a dramatic performance degradation as context length increases, with success rates dropping from 40-50% in baseline conditions to less than 10% in long-context scenarios. Our detailed error analysis reveals that agents primarily fail due to getting stuck in loops and losing track of original task objectives. We further propose an implicit RAG approach that provides modest improvements by generating task-relevant summaries, though fundamental limitations in long-context reasoning persist. These findings highlight critical challenges for deploying WebAgents in realistic, long-term user interaction scenarios and provide insights for developing more robust agent architectures capable of maintaining coherent task execution across extended contexts.
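The context-construction idea in the abstract (irrelevant trajectories injected between two dependent subtasks until a target length is reached) can be illustrated with a minimal sketch. This is not the authors' framework: the function names, the whitespace-based token estimate, and the greedy fill order are all assumptions made for illustration.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())


def build_long_context(first_subtask: str, second_subtask: str,
                       distractors: List[str], token_budget: int) -> List[str]:
    """Place irrelevant trajectories between two dependent subtasks.

    Distractors are appended in order until adding the next one would
    push the whole history past token_budget (hypothetical policy).
    """
    history = [first_subtask]
    used = count_tokens(first_subtask) + count_tokens(second_subtask)
    for trajectory in distractors:
        cost = count_tokens(trajectory)
        if used + cost > token_budget:
            break  # budget reached: stop injecting distractors
        history.append(trajectory)
        used += cost
    # The dependent subtask comes last, so the agent must look back
    # past the distractors to recover what it needs.
    history.append(second_subtask)
    return history
```

In a setup like this, varying `token_budget` would be one way to sweep context length while keeping the two dependent subtasks fixed.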
- North America > The Bahamas (0.14)
- North America > United States > New York (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (11 more...)
- Workflow (0.93)
- Research Report > New Finding (0.93)
- Media (1.00)
- Consumer Products & Services (1.00)
- Transportation (0.93)
- Leisure & Entertainment > Sports > Basketball (0.46)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- (6 more...)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.46)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- North America > United States > New Hampshire > Rockingham County > Portsmouth (0.04)
- (8 more...)
US Investment in Spyware Is Skyrocketing
A new report warns that the number of US investors in powerful commercial spyware rose sharply in 2024 and names new countries linked to the dangerous technology. The United States has emerged as the largest investor in commercial spyware, a global industry that has enabled the covert surveillance of journalists, human rights defenders, politicians, diplomats, and others, posing grave threats to human rights and national security. In 2024, 20 new US-based spyware investors were identified, bringing the total number of American backers of this technology to 31. This growth has largely outpaced other major investing countries such as Israel, Italy, and the United Kingdom, according to a new report published today by the Atlantic Council. The study surveyed 561 entities across 46 countries between 1992 and 2024, identifying 34 new investors.
- Europe > Italy (0.36)
- Asia > Middle East > Israel (0.26)
- Europe > United Kingdom (0.25)
- (22 more...)
- Transportation > Ground > Road (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance > Trading (1.00)
- (3 more...)
Large Language Models as symbolic DNA of cultural dynamics
Pourdavood, Parham, Jacob, Michael, Deacon, Terrence
Although the recent wave of AI models, known as Large Language Models (LLMs), is seamlessly surpassing the Turing Test, this milestone has been overshadowed by their rapid commercialization and the profound ways they are already reshaping society. The pursuit of Artificial General Intelligence (AGI), commonly defined as human-level intelligence, is touted as the next major milestone. Yet whether continued progress within the current framework could ever lead to agency and meaning at the scale of AI itself remains an open and contested question. Critics argue that current LLMs operate through algorithmic mimicry, that is, simulating intelligent behavior without embodying the principles behind it (Jaeger, 2024; Jaeger et al., 2024). Artificial Neural Networks, the main framework behind LLMs, operate on behaviorist assumptions: a framework that focuses exclusively on observable input-output patterns while treating internal states as part of a "black box" to be optimized (Brooks, 1991; Sutton & Barto, 2015). This does not mean LLMs lack sophisticated engineering, but their structure is designed to optimize internal states based on input-output feedback loops. Even though the logic behind behaviorism is likely one of the key principles supporting an intelligent system, it is likely not sufficient for intelligence and is not what enables agency and intelligence in the first place (Dreyfus, 1992; Searle, 1980). Furthermore, it would be naive to treat outwardly intelligent behavior as evidence of acquired intelligence or sentience, since a good simulation can be powerful and convincing. To address such issues, alternative approaches grounded in organismal intelligence are emerging that instead explain the principles behind intelligence through intrinsic and goal-directed models of the body and its relationship to the environment (Deacon, 2012; Jacob, 2023; Jaeger et al., 2024; Levin, 2019; Roli et al., 2022; Varela et al., 1993; Watson, 2024).
- North America > United States > California > San Francisco County > San Francisco (0.28)
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > New York (0.04)
- (2 more...)
Personalized Artificial General Intelligence (AGI) via Neuroscience-Inspired Continuous Learning Systems
Gupta, Rajeev, Gupta, Suhani, Parikh, Ronak, Gupta, Divya, Javaheri, Amir, Shaktawat, Jairaj Singh
Artificial Intelligence has made remarkable advancements in recent years, primarily driven by increasingly large deep learning models. However, achieving true Artificial General Intelligence (AGI) demands fundamentally new architectures rather than merely scaling up existing models. Current approaches largely depend on expanding model parameters, which improves task-specific performance but falls short in enabling continuous, adaptable, and generalized learning. Achieving AGI capable of continuous learning and personalization on resource-constrained edge devices is an even bigger challenge. This paper reviews the state of continual learning and neuroscience-inspired AI, and proposes a novel architecture for Personalized AGI that integrates brain-like learning mechanisms for edge deployment. We review literature on continuous lifelong learning, catastrophic forgetting, and edge AI, and discuss key neuroscience principles of human learning, including Synaptic Pruning, Hebbian plasticity, Sparse Coding, and Dual Memory Systems, as inspirations for AI systems. Building on these insights, we outline an AI architecture that features complementary fast-and-slow learning modules, synaptic self-optimization, and memory-efficient model updates to support on-device lifelong adaptation. Conceptual diagrams of the proposed architecture and learning processes are provided. We address challenges such as catastrophic forgetting, memory efficiency, and system scalability, and present application scenarios for mobile AI assistants and embodied AI systems like humanoid robots. We conclude with key takeaways and future research directions toward truly continual, personalized AGI on the edge. While the architecture is theoretical, it synthesizes diverse findings and offers a roadmap for future implementation.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Alameda County > Dublin (0.04)
- North America > United States > New Jersey > Middlesex County > Edison (0.04)
- (2 more...)
- Research Report (1.00)
- Overview (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Education (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
SynthFM: Training Modality-agnostic Foundation Models for Medical Image Segmentation without Real Medical Data
Sengupta, Sourya, Chakrabarty, Satrajit, Ravi, Keerthi Sravan, Avinash, Gopal, Soni, Ravi
Affiliations: (1) GE HealthCare, San Ramon, CA, USA (all authors); (2) University of Illinois Urbana-Champaign, Urbana, IL, USA (Sourya Sengupta).
Foundation models like the Segment Anything Model (SAM) excel in zero-shot segmentation for natural images but struggle with medical image segmentation due to differences in texture, contrast, and noise. Annotating medical images is costly and requires domain expertise, limiting large-scale annotated data availability. To address this, we propose SynthFM, a synthetic data generation framework that mimics the complexities of medical images, enabling foundation models to adapt without real medical data. Using SAM's pretrained encoder and training the decoder from scratch on SynthFM's dataset, we evaluated our method on 11 anatomical structures across 9 datasets (CT, MRI, and Ultrasound). SynthFM outperformed zero-shot baselines like SAM and MedSAM, achieving superior results under different prompt settings and on out-of-distribution datasets.
- North America > United States > Illinois > Champaign County > Urbana (0.54)
- North America > United States > California > Contra Costa County > San Ramon (0.24)
SAMRI-2: A Memory-based Model for Cartilage and Meniscus Segmentation in 3D MRIs of the Knee Joint
Ferreira, Danielle L., Nunes, Bruno A. A., Zhang, Xuzhe, Gomez, Laura Carretero, Fung, Maggie, Soni, Ravi
Accurate morphometric assessment of cartilage (such as thickness and volume) via MRI is essential for monitoring knee osteoarthritis. Segmenting cartilage remains challenging and dependent on extensive expert-annotated datasets, which are heavily subject to inter-reader variability. Recent advancements in Visual Foundational Models (VFM), especially memory-based approaches, offer opportunities for improving generalizability and robustness. This study introduces a deep learning (DL) method for cartilage and meniscus segmentation from 3D MRIs using interactive, memory-based VFMs. To improve spatial awareness and convergence, we incorporated a Hybrid Shuffling Strategy (HSS) during training and applied a segmentation mask propagation technique to enhance annotation efficiency. We trained four AI models (a CNN-based 3D-VNet, two automatic transformer-based models, SaMRI2D and SaMRI3D, and a transformer-based promptable memory-based VFM, SAMRI-2) on 3D knee MRIs from 270 patients using public and internal datasets and evaluated them on 57 external cases, including multi-radiologist annotations and different data acquisitions. Model performance was assessed against reference standards using Dice Score (DSC) and Intersection over Union (IoU), with additional morphometric evaluations to further quantify segmentation accuracy. The SAMRI-2 model, trained with HSS, outperformed all other models, achieving an average DSC improvement of 5 points, with a peak improvement of 12 points for tibial cartilage. It also demonstrated the lowest cartilage thickness errors, reducing discrepancies by up to threefold. Notably, SAMRI-2 maintained high performance with as few as three user clicks per volume, reducing annotation effort while ensuring anatomical precision. This memory-based VFM with spatial awareness offers a novel approach for reliable AI-assisted knee MRI segmentation, advancing DL in musculoskeletal imaging.
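The two overlap metrics named in the abstract have standard definitions for binary masks: DSC = 2|A ∩ B| / (|A| + |B|) and IoU = |A ∩ B| / |A ∪ B|. The sketch below computes both on flattened masks. It is not the authors' evaluation code; the function name and the convention of returning 1.0 when both masks are empty are assumptions.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], ref: Sequence[int]) -> Tuple[float, float]:
    """Return (Dice, IoU) for two flattened binary masks of equal length."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    pred_size = sum(1 for p in pred if p)
    ref_size = sum(1 for r in ref if r)
    union = pred_size + ref_size - intersection
    if union == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_size + ref_size)
    iou = intersection / union
    return dice, iou
```

For the same pair of masks, Dice is never lower than IoU, which is worth keeping in mind when comparing numbers reported with different metrics.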
- North America > United States > California > Contra Costa County > San Ramon (0.14)
- North America > United States > New York (0.04)
- North America > United States > Wisconsin > Waukesha County > Waukesha (0.04)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)